Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Soft-deprecate the use of implicit reliance on NULL-terminated strings. #99806

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Ivorforce
Copy link
Contributor

@Ivorforce Ivorforce commented Nov 28, 2024

"MVP" of addressing godotengine/godot-proposals#11249 and godotengine/godot-proposals#11258. The motivation for this direction are discussed there (and linked issues). This PR does not yet change assumptions about user-supplied strings; NULL termination checks are only avoided for static strings supplied from the codebase.

This change is toggled with the c++-define GODOT_SHOW_STR_DEPRECATIONS (off by default). For unowned string references, StrRange can be used, which avoids allocation overhead. This is already employed in sprintf and operator+ in this PR, because there are a lot of uses of these functions.

This change is as minimal as I could realistically get it to be: ustring has quite a lot of changes, but most other files work exactly as before. This changes when GODOT_SHOW_STR_DEPRECATIONS is toggled: Suddenly, the console explodes into a flurry of warnings, because strlen is used in a place where it may be inappropriate and wastes CPU cycles.

This PR should already slightly improves performance by letting callers pass their string[] as string[] with known lengths, rather than having it recomputed immediately. Every change we will make to get rid of another GODOT_SHOW_STR_DEPRECATIONS deprecation warning will improve the engine speed further. Every function we can make refactor to use StrRange instead of String will reduce allocations and improve engine speed.

Please, participate in lively discussion! I realize merging this is kind of a large ask, especially from a pretty fresh contributor like myself. I tried to make the change as unopinionated as possible, and focused on just improving speed and conditions for future PRs to improve speed further.

Still, I personally think this direction is unavoidable sooner or later, if we want to make the best use of clock cycles for the hard parts elsewhere. Also, i have heard it mentioned a few times that the Godot source code has a side-mission as a learning tool - making computation explicit to avoid accidental promotions and data copies is very important in clean and fast code :)

@Ivorforce Ivorforce requested review from a team as code owners November 28, 2024 19:40
@Ivorforce Ivorforce force-pushed the string-deprecate-null-term branch 15 times, most recently from 1b8e3da to 1cb5c1e Compare November 28, 2024 21:29
@Ivorforce Ivorforce requested a review from a team as a code owner November 28, 2024 21:29
@Mickeon Mickeon added this to the 4.x milestone Nov 28, 2024
@Mickeon
Copy link
Contributor

Mickeon commented Nov 28, 2024

This feels somewhat related to #84680 @Repiteo

@clayjohn
Copy link
Member

Sounds interesting! Can you share the results of your profiling so we have an idea of what kind of performance differential to expect?

@Ivorforce
Copy link
Contributor Author

Sounds interesting! Can you share the results of your profiling so we have an idea of what kind of performance differential to expect?

Will do once I get everything rolling again! It's still stuck between not compiling or causing segfaults, hopefully to resolve soon :)

@clayjohn clayjohn marked this pull request as draft November 28, 2024 22:56
@Ivorforce Ivorforce force-pushed the string-deprecate-null-term branch 12 times, most recently from 691af48 to 54d429b Compare November 29, 2024 17:33
@adamscott
Copy link
Member

Hello @Ivorforce ! I hope you're doing well.

The release team found out that you were updating your PR a lot lately. Could you add a tag in your commit message when CI checks are not needed? (see https://docs.github.com/en/actions/managing-workflow-runs-and-deployments/managing-workflow-runs/skipping-workflow-runs)

It's because each time you push an update, even a tiny one, it triggers full CI checks. 😅

@Ivorforce
Copy link
Contributor Author

Yep, sorry, will do!

@Ivorforce Ivorforce force-pushed the string-deprecate-null-term branch 2 times, most recently from 8913fbd to da45a5d Compare December 1, 2024 15:56
@Ivorforce Ivorforce marked this pull request as ready for review December 1, 2024 15:59
@Ivorforce Ivorforce requested a review from a team as a code owner December 1, 2024 15:59
@Ivorforce
Copy link
Contributor Author

Ivorforce commented Dec 1, 2024

Tests are finally passing (locally anyway)!

I managed to salvage this PR by doing less at once. The core is still the same - implicit conversions from char * to String are soft-deprecated (and implicit conversions from `CharString and Char16String to ptr), while implicit construction from string literal to String remains legal.

The strlen calls in the 'String from String Literal' constructors are resolved at compile time, as can be seen here: https://godbolt.org/z/aTv61TT1K. I first tried to use the len template parameter, but realized that some string literals are implicitly ended with NULL while others aren't. Calling strlen at compile time should be the safest route to go, and it shows because without it the tests weren't passing 😅

All in all, this PR should slightly speed up conversion from string literal to String, and allow us to move more toward explicitness and no NULL termination reliance in Strings. When / If it gets merged, I will likely add follow-up PRs to explicitize implicit conversions throughout the codebase.

I recommend merging #99816 and #99817 before this one because this change includes the former ones, making the changelog more granular.

Edit: One test is failing, I think that's just caches though?

…icitly parse the given string to find its end. Add known-length constructors and explicit from_c_str static methods.
@Repiteo
Copy link
Contributor

Repiteo commented Dec 1, 2024

Holy shit, you have no idea how cathartic it is to see someone else take an interest in this topic. I'm 100% onboard with modernizing our strings to not rely on null-termination, especially considering \0 is a perfectly valid Unicode codepoint (for better or for worse)

@Repiteo Repiteo removed request for a team December 1, 2024 19:40
@Ivorforce
Copy link
Contributor Author

Ivorforce commented Dec 2, 2024

I made a synthetic performance benchmark: https://onlinegdb.com/rzGUA045Ry

This PR is not about performance, but I wanted to check anyway, since some string creators already automatically select the static length constructor with this PR. The speed gain is not very consistent (it's more consistent offline); unscientifically i would estimate it's about 9% for short length strings, 5% for medium length strings, and 4% for long length strings such as from canvas.glsl.gen.h.

@fire
Copy link
Member

fire commented Dec 2, 2024

I'll bump the build, but I don't think it's a cache thing.

drivers.windows.editor.x86_64.lib(feed_effects.windows.editor.x86_64.obj) : error LNK2019: unresolved external symbol "public: __cdecl String::String(char const *)" (??0String@@QEAA@PEBD@Z) referenced in function "public: __cdecl GLES3::FeedEffects::FeedEffects(void)" (??0FeedEffects@GLES3@@QEAA@XZ)
bin\godot.windows.editor.x86_64.exe : fatal error LNK1120: 1 unresolved externals

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants